34 research outputs found

    Computing Palindromes on a Trie in Linear Time

    Shortest Unique Substring Queries on Run-Length Encoded Strings

    We consider the problem of answering shortest unique substring (SUS) queries on run-length encoded strings. For a string S, a unique substring u = S[i..j] is said to be a shortest unique substring (SUS) of S containing an interval [s, t] (i <= s <= t <= j) if for any i' <= s <= t <= j' with j-i > j'-i', S[i'..j'] occurs at least twice in S. Given a run-length encoding of size m of a string of length N, we show that we can construct a data structure of size O(m + pi_s(N, m)) in O(m log m + pi_c(N, m)) time such that queries can be answered in O(pi_q(N, m) + k) time, where k is the size of the output (the number of SUSs), and pi_s(N, m), pi_c(N, m), pi_q(N, m) are, respectively, the size, construction time, and query time of a predecessor/successor query data structure of m elements for the universe [1, N]. Using the data structure by Beame and Fich (JCSS 2002), this results in a data structure of O(m) space that is constructed in O(m log m) time and answers queries in O(sqrt(log m / log log m) + k) time.
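As a rough illustration of the two ingredients named in this abstract, the sketch below run-length encodes a string and then answers a predecessor query over the run start positions to find the run containing a given position. The function names are hypothetical, and the lookup uses plain binary search (O(log m) per query) rather than the Beame and Fich structure, so it does not achieve the stated query bound.

```python
from bisect import bisect_right
from itertools import groupby


def run_length_encode(s):
    """Return a list of (character, run_length) pairs for s."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def run_starts(runs):
    """Return the 1-based starting position of each run."""
    starts, pos = [], 1
    for _, length in runs:
        starts.append(pos)
        pos += length
    return starts


def run_containing(starts, position):
    """Predecessor query: index of the run that contains a 1-based position.

    Assumption: binary search stands in for the predecessor structure.
    """
    return bisect_right(starts, position) - 1


runs = run_length_encode("aaabbbbcaa")
starts = run_starts(runs)
```

Here `runs` has m = 4 entries for a string of length N = 10, and `run_containing(starts, 5)` identifies the run of `b` characters.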

    Tight Bounds on the Maximum Number of Shortest Unique Substrings

    A substring Q of a string S is called a shortest unique substring (SUS) for interval [s, t] in S if Q occurs exactly once in S, this occurrence of Q contains interval [s, t], and every substring of S which contains interval [s, t] and is shorter than Q occurs at least twice in S. The SUS problem is, given a string S, to preprocess S so that for any subsequent query interval [s, t] all the SUSs for interval [s, t] can be answered quickly. When s = t, we call the SUSs for [s, t] point SUSs, and when s <= t, we call the SUSs for [s, t] interval SUSs. There exists an optimal O(n)-time preprocessing scheme which answers queries in optimal O(k) time for both point and interval SUSs, where n is the length of S and k is the number of outputs for a given query. In this paper, we reveal structural, combinatorial properties underlying the SUS problem: namely, we show that the number of intervals in S that correspond to point SUSs for all query positions in S is less than 1.5n, and show that this is a matching upper and lower bound. Also, we consider the maximum number of intervals in S that correspond to interval SUSs for all query intervals in S.
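The SUS definition above can be checked directly on small inputs. The following is a brute-force sketch only (roughly cubic time or worse, nothing like the optimal O(n)-time preprocessing the abstract mentions); the function names are hypothetical and positions are 1-based to match the abstract.

```python
def count_occurrences(s, sub):
    """Count possibly overlapping occurrences of sub in s."""
    return sum(1 for i in range(len(s) - len(sub) + 1) if s.startswith(sub, i))


def shortest_unique_substrings(s, start, end):
    """Return all SUSs for the 1-based interval [start, end] as (i, j) pairs."""
    n = len(s)
    # Try candidate lengths from the interval length upward; stop at the first
    # length for which some window covering [start, end] is unique.
    for length in range(end - start + 1, n + 1):
        found = []
        first = max(1, end - length + 1)
        last = min(start, n - length + 1)
        for i in range(first, last + 1):
            j = i + length - 1
            if count_occurrences(s, s[i - 1:j]) == 1:
                found.append((i, j))
        if found:
            return found
    return []


point_sus = shortest_unique_substrings("abcbd", 2, 2)
```

For the string `abcbd` and the point query at position 2, the character `b` occurs twice, while both `ab` and `bc` are unique, so the query has two SUSs.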

    Sliding suffix trees simplified

    Sliding suffix trees (Fiala & Greene, 1989) for an input text T over an alphabet of size σ and a sliding window W of T can be maintained in O(|T| log σ) time and O(|W|) space. The two previous approaches that achieve this can be categorized into the credit-based approach of Fiala and Greene (1989) and Larsson (1996, 1999), or the batch-based approach proposed by Senft (2005). Brodnik and Jekovec (2018) showed that the sliding suffix tree can be supplemented with leaf pointers in order to find all occurrences of an online query pattern in the current window, and that leaf pointers can be maintained by credit-based arguments as well. The main difficulty in the credit-based approach is in the maintenance of index-pairs that represent each edge. In this paper, we show that valid edge index-pairs can be derived in constant time from leaf pointers, thus reducing the maintenance of edge index-pairs to the maintenance of leaf pointers. We further propose a new simple method which maintains leaf pointers without using credit-based arguments. Our algorithm and proof of correctness are much simpler compared to the credit-based approach, whose analyses were initially flawed (Senft 2005). Comment: 12 pages + 5 pages of appendix. 18 figures in total.

    Finding Top-k Longest Palindromes in Substrings

    Palindromes are strings that read the same forward and backward. Problems of computing palindromic structures in strings have been studied for many years with a motivation of their application to biology. The longest palindrome problem is one of the most important and classical problems regarding palindromic structures, that is, to compute the longest palindrome appearing in a string T of length n. The problem can be solved in O(n) time by the famous algorithm of Manacher [Journal of the ACM, 1975]. This paper generalizes the longest palindrome problem to the problem of finding top-k longest palindromes in an arbitrary substring, including the input string T itself. The internal top-k longest palindrome query is, given a substring T[i..j] of T and a positive integer k as a query, to compute the top-k longest palindromes appearing in T[i..j]. This paper proposes a linear-size data structure that can answer internal top-k longest palindromes queries in optimal O(k) time. Also, given the input string T, our data structure can be constructed in O(n log n) time. For k = 1, the construction time is reduced to O(n).
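For reference, the classical longest palindrome problem mentioned in this abstract can be sketched as follows. This is a common textbook formulation of Manacher's algorithm using a padded string, not the internal top-k data structure the paper proposes; the function name is hypothetical.

```python
def longest_palindrome(text):
    """Return a longest palindromic substring of text (Manacher's algorithm)."""
    if not text:
        return ""
    # Interleave separators so every palindrome has odd length and one center.
    padded = "#" + "#".join(text) + "#"
    radius = [0] * len(padded)
    center = right = 0  # center and right end of the rightmost palindrome seen
    for i in range(len(padded)):
        if i < right:
            # Reuse the mirror position's radius, clipped to the known boundary.
            radius[i] = min(right - i, radius[2 * center - i])
        # Extend the palindrome centered at i as far as it goes.
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < len(padded) and padded[lo] == padded[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    # A radius in the padded string equals the palindrome length in text.
    best = max(range(len(padded)), key=radius.__getitem__)
    start = (best - radius[best]) // 2
    return text[start:start + radius[best]]
```

Each position either reuses a previously computed radius or pushes the `right` boundary forward, which is what gives the O(n) running time cited above. When several palindromes share the maximum length, this sketch returns the leftmost one.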

    String Sanitization Under Edit Distance: Improved and Generalized

    Let W be a string of length n over an alphabet Σ, k be a positive integer, and S be a set of length-k substrings of W. The ETFS problem asks us to construct a string X_ED such that: (i) no string of S occurs in X_ED; (ii) the order of all other length-k substrings over Σ is the same in W and in X_ED; and (iii) X_ED has minimal edit distance to W. When W represents an individual's data and S represents a set of confidential patterns, the ETFS problem asks for transforming W to preserve its privacy and its utility [Bernardini et al., ECML PKDD 2019]. ETFS can be solved in O(n^2 k) time [Bernardini et al., CPM 2020]. The same paper shows that ETFS cannot be solved in O(n^(2−δ)) time, for any δ > 0, unless the Strong Exponential Time Hypothesis (SETH) is false. Our main results can be summarized as follows: an O(n^2 log^2 k)-time algorithm to solve ETFS; and an O(n^2 log^2 n)-time algorithm to solve AETFS, a generalization of ETFS in which the elements of S can have arbitrary lengths.